{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "13cb272e",
   "metadata": {},
   "source": [
    "# Vector search with LanceDB Cloud and LlamaIndex \n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9a0e829a",
   "metadata": {
    "id": "wgPbKbpumkhH"
   },
   "source": [
    "### Credentials\n",
    "\n",
    "Copy the project slug and the API key from your LanceDB Cloud project page.\n",
    "These will be used later to [connect to LanceDB Cloud](#scroll-to=5q8m6GMD7sGu)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "6553603f",
   "metadata": {
    "id": "rqEXT5-fmofw"
   },
   "outputs": [],
   "source": [
    "project_slug = \"your-project-slug\"  # @param {type:\"string\"}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "36ef9c45",
   "metadata": {
    "id": "5LYmBomPmswi"
   },
   "outputs": [],
   "source": [
    "api_key = \"sk_...\"  # @param {type:\"string\"}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "33ba6af1",
   "metadata": {
    "id": "Xs6tr6CMnBrr"
   },
   "source": [
    "You can also set `LANCEDB_API_KEY` as an environment variable. More details can be found <a href=\"https://github.com/lancedb/vectordb-recipes/tree/main/examples/RAG_Reranking/lancedb_cloud/README.md\">**here**</a>."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Le27BWs2vDbB"
   },
   "source": [
    "Since we will be using the OpenAI API, let's set the OpenAI API key as well."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "-2-fyVPKu9fl"
   },
   "outputs": [],
   "source": [
    "openai_api_key = \"sk-...\"  # @param {type:\"string\"}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1991331f-4316-417a-b693-e2f27cbe9ea7",
   "metadata": {},
   "source": [
    "### Installing dependencies"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e8a49c31",
   "metadata": {},
   "outputs": [],
   "source": [
    "! pip install llama-index-vector-stores-lancedb llama-index-readers-file llama-index-embeddings-openai llama-index-llms-openai"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "0QQL4lm8lTzg"
   },
   "source": [
    "### Importing libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "vP6d6JUShgqo"
   },
   "outputs": [],
   "source": [
    "import openai\n",
    "import logging\n",
    "import sys\n",
    "\n",
    "# Uncomment to see debug logs\n",
    "# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)\n",
    "# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))\n",
    "\n",
    "from llama_index.core import SimpleDirectoryReader, Document, StorageContext\n",
    "from llama_index.core import VectorStoreIndex\n",
    "from llama_index.vector_stores.lancedb import LanceDBVectorStore\n",
    "import textwrap\n",
    "\n",
    "openai.api_key = openai_api_key\n",
    "assert openai.models.list() is not None"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "8eKRYd2F7v5n"
   },
   "source": [
    "### Download the data\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "l0ezDr7suAf_"
   },
   "outputs": [],
   "source": [
    "! mkdir -p 'data/paul_graham/'\n",
    "! wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'\n",
    "! ls 'data/paul_graham/'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "HJf8xZmX8VJC"
   },
   "source": [
    "Load the documents stored in `data/paul_graham/` using `SimpleDirectoryReader`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "5aljyqpUiViE"
   },
   "outputs": [],
   "source": [
    "documents = SimpleDirectoryReader(\"data/paul_graham/\").load_data()\n",
    "print(\"Document ID:\", documents[0].doc_id, \"Document Hash:\", documents[0].hash)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "IiM4DJvC_2dV"
   },
   "source": [
    "### Store data in LanceDB Cloud\n",
    "\n",
    "Let's connect to LanceDB Cloud so we can store our documents. It requires no setup!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "GV77SSi-AK0v"
   },
   "outputs": [],
   "source": [
    "uri = \"db://\" + project_slug\n",
    "table_name = \"llamaindex_vectorstore\"  # optional; the default table name is \"vectors\"\n",
    "\n",
    "vector_store = LanceDBVectorStore(\n",
    "    uri=uri,  # your remote DB URI\n",
    "    api_key=api_key,  # LanceDB Cloud API key set in the Credentials section\n",
    "    region=\"your-region\",  # the region you configured\n",
    "    table_name=table_name,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "sZOUxfqzXr1m"
   },
   "source": [
    "### Create an index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "4nDltKClAhhU"
   },
   "outputs": [],
   "source": [
    "storage_context = StorageContext.from_defaults(vector_store=vector_store)\n",
    "\n",
    "index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "xoS-WKXMXvvR"
   },
   "source": [
    "And that's it! We're all set up. The next step is to run some queries. Let's try a few:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "7SKSlyq2iwpK"
   },
   "source": [
    "### Query the index\n",
    "We can now ask questions using the index we created. Results can be filtered with `MetadataFilters` or with a native Lance `where` clause."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5eb6419b",
   "metadata": {},
   "outputs": [],
   "source": [
    "from datetime import datetime\n",
    "from llama_index.core.vector_stores import (\n",
    "    MetadataFilters,\n",
    "    FilterOperator,\n",
    "    FilterCondition,\n",
    "    MetadataFilter,\n",
    ")\n",
    "\n",
    "date = datetime.today().strftime(\"%Y-%m-%d\")\n",
    "query_filters = MetadataFilters(\n",
    "    filters=[\n",
    "        MetadataFilter(\n",
    "            key=\"creation_date\",\n",
    "            operator=FilterOperator.EQ,\n",
    "            value=date,  # using current date as the latest data is scraped\n",
    "        ),\n",
    "        MetadataFilter(key=\"file_size\", value=75040, operator=FilterOperator.GT),\n",
    "    ],\n",
    "    condition=FilterCondition.AND,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Viaweb charged $100 a month for a small store and $300 a month for a big one.\n",
       "metadata - ..."
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query_engine = index.as_query_engine(\n",
    "    filters=query_filters,\n",
    ")\n",
    "\n",
    "response = query_engine.query(\"How much did Viaweb charge per month?\")\n",
    "print(response)\n",
    "print(\"metadata -\", response.metadata)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0c1c6c73",
   "metadata": {},
   "source": [
    "Let's use LanceDB's SQL-like filters directly via the `where` clause:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0a2bcc07",
   "metadata": {},
   "outputs": [],
   "source": [
    "lance_filter = \"metadata.file_name = 'paul_graham_essay.txt' \"\n",
    "retriever = index.as_retriever(vector_store_kwargs={\"where\": lance_filter})\n",
    "response = retriever.retrieve(\"What did the author do growing up?\")\n",
    "print(response[0].get_content())\n",
    "print(\"metadata -\", response[0].metadata)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "sZOUxfqzXr1m"
   },
   "source": [
    "### Append data to the index\n",
    "You can also add data to an existing index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "069fc099",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Insert a new document into the existing index rather than building a separate one\n",
    "index.insert(Document(text=\"The sky is purple in Portland, Maine\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b5cffcfe",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Portland, Maine\n"
     ]
    }
   ],
   "source": [
    "query_engine = index.as_query_engine()\n",
    "response = query_engine.query(\"Where is the sky purple?\")\n",
    "print(textwrap.fill(str(response), 100))"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
